RAW SEQUENCE LISTING DATE: 01/15/2002 

PATENT APPLICATION: US/09/988,292 TIME: 18:58:37 

Input Set : N:\Crf3\RULE60\09988292.raw 
Output Set: N:\CRF3\01152002\I988292.raw 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Yu, Guo-Liang 
Rosen, Craig 

(ii) TITLE OF INVENTION: Colon Specific Genes and Proteins 
(iii) NUMBER OF SEQUENCES: 24 
(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Carella, Byrne, Bain, Gilfillan, Cecchi, 

Stewart & Olstein 

(B) STREET: 6 Becker Farm Road 

(C) CITY: Roseland 

(D) STATE: NJ 

(E) COUNTRY: USA 

(F) ZIP: 07068-1739 
(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 
(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US/09/988,292 

(B) FILING DATE: 19-NOV-2001 

(C) CLASSIFICATION: 
(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 09/224,110 

(B) FILING DATE: 
(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Ferraro, Gregory D. 

(B) REGISTRATION NUMBER: 36,134 

(C) REFERENCE/DOCKET NUMBER: 325800-435 
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 201-994-1700 

(B) TELEFAX: 201-994-1744 
(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 638 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 
(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 1..501 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..501 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

68 GCC AGG CAG CTG GCT GCC SAC CAG GCC GTG TAT GTG AAG GTC AAG GCT 48 

W--> 69 Ala Arg Gln Leu Ala Ala Xaa Gln Ala Val Tyr Val Lys Val Lys Ala 

70 1 5 10 15 

72 GAA GCC CGG GAA CTG CTG GGC CAC CCG TGG TCT CTG TGT CCT GTC TGT 96 

73 Glu Ala Arg Glu Leu Leu Gly His Pro Trp Ser Leu Cys Pro Val Cys 

74 20 25 30 

76 GGG TGC CAA CTC ACC ACC TTT GAT GGG GCC CGT GGT GCC ACC ACT CTC 144 

77 Gly Cys Gln Leu Thr Thr Phe Asp Gly Ala Arg Gly Ala Thr Thr Leu 

78 35 40 45 

80 CTG GTG TCT ATG AAG CTC TCT TCC CGC TGC CCA GGA CTA CAG AAT ACC 192 

81 Leu Val Ser Met Lys Leu Ser Ser Arg Cys Pro Gly Leu Gln Asn Thr 

82 50 55 60 

84 ATC CCC TGG TAC CGT GTA GTT GCC GAA GTC CAG ATC TGC CAT GGC AAA 240 

85 Ile Pro Trp Tyr Arg Val Val Ala Glu Val Gln Ile Cys His Gly Lys 

86 65 70 75 80 

88 ACG GAG GCT GTG GGC CAG GTC CAC ATC TTC TTC CAG GAT GGG ATG GTG 288 

89 Thr Glu Ala Val Gly Gln Val His Ile Phe Phe Gln Asp Gly Met Val 

90 85 90 95 

92 ACG TTG ACT CCA AAC AAG GGT GTG TGG GTG AAT GGT CTC CGA GTG GAT 336 

93 Thr Leu Thr Pro Asn Lys Gly Val Trp Val Asn Gly Leu Arg Val Asp 

94 100 105 110 

96 CTC CCA GCT GAG AAG TTA GCA TCT GTG TCC GTG AGT CGT ACA CCT GAT 384 

97 Leu Pro Ala Glu Lys Leu Ala Ser Val Ser Val Ser Arg Thr Pro Asp 

98 115 120 125 

100 GGC TCC CTG CTA GTC CGC CAG AAG GCA GGG GTC CAG GTG TGG CTT GGA 432 

101 Gly Ser Leu Leu Val Arg Gln Lys Ala Gly Val Gln Val Trp Leu Gly 

102 130 135 140 

104 GCC AAT GGG AAG GTG GCT GTG ATT GTG AGC AAT GAC CAT GCT GGG AAA 480 

105 Ala Asn Gly Lys Val Ala Val Ile Val Ser Asn Asp His Ala Gly Lys 

106 145 150 155 160 

108 CTG TGT GGG GGC CTK TGG AAA ATTTGACGGG GGACCAGACC AATGATTGGG 531 

W--> 109 Leu Cys Gly Gly Xaa Trp Lys 

110 165 

112 ATGATTCCCA GGAGAAGCCA GCGATTGGGG AAWTGGAGAG CGCAGGGACT TTCTYCCMCA 591 
114 TGTTAATGGG CTTGWTCCAG TTCATCCCAC CAGGAACGAA GGATTTT 638 
117 (2) INFORMATION FOR SEQ ID NO: 2: 
119 (i) SEQUENCE CHARACTERISTICS: 

120 (A) LENGTH: 167 amino acids 

121 (B) TYPE: amino acid 

122 (D) TOPOLOGY: linear 
124 (ii) MOLECULE TYPE: protein 

126 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

W--> 128 Ala Arg Gln Leu Ala Ala Xaa Gln Ala Val Tyr Val Lys Val Lys Ala 

129 1 5 10 15 

131 Glu Ala Arg Glu Leu Leu Gly His Pro Trp Ser Leu Cys Pro Val Cys 

132 20 25 30 

134 Gly Cys Gln Leu Thr Thr Phe Asp Gly Ala Arg Gly Ala Thr Thr Leu 

135 35 40 45 

137 Leu Val Ser Met Lys Leu Ser Ser Arg Cys Pro Gly Leu Gln Asn Thr 

138 50 55 60 

140 Ile Pro Trp Tyr Arg Val Val Ala Glu Val Gln Ile Cys His Gly Lys 

141 65 70 75 80 

143 Thr Glu Ala Val Gly Gln Val His Ile Phe Phe Gln Asp Gly Met Val 

144 85 90 95 

146 Thr Leu Thr Pro Asn Lys Gly Val Trp Val Asn Gly Leu Arg Val Asp 
147 100 105 110 

149 Leu Pro Ala Glu Lys Leu Ala Ser Val Ser Val Ser Arg Thr Pro Asp 

150 115 120 125 

152 Gly Ser Leu Leu Val Arg Gln Lys Ala Gly Val Gln Val Trp Leu Gly 

153 130 135 140 

155 Ala Asn Gly Lys Val Ala Val Ile Val Ser Asn Asp His Ala Gly Lys 

156 145 150 155 160 
W--> 158 Leu Cys Gly Gly Xaa Trp Lys 

159 165 

161 (2) INFORMATION FOR SEQ ID NO: 3: 

163 (i) SEQUENCE CHARACTERISTICS: 

164 (A) LENGTH: 874 base pairs 

165 (B) TYPE: nucleic acid 

166 (C) STRANDEDNESS: single 

167 (D) TOPOLOGY: linear 
169 (ii) MOLECULE TYPE: cDNA 

172 (ix) FEATURE: 

173 (A) NAME/KEY: CDS 

174 (B) LOCATION: 1..705 

176 (ix) FEATURE: 

177 (A) NAME/KEY: mat_peptide 

178 (B) LOCATION: 1..705 

181 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

183 CAG GAC TGC GTG TGC ACG GAC AAG GTG GAC AAC AAC ACC CTG CTC AAC 48 

184 Gln Asp Cys Val Cys Thr Asp Lys Val Asp Asn Asn Thr Leu Leu Asn 

185 1 5 10 15 

187 GTC ATC GCC TGC ACC CAC GTG CCC TGC AAC ACC TCC TGC AGC CCT GGG 96 

188 Val Ile Ala Cys Thr His Val Pro Cys Asn Thr Ser Cys Ser Pro Gly 

189 20 25 30 

191 TTC GAA CTC ATG GAG GCC CCC GGG GAG TGC TGT AAG AAG TGT GAA CAG 144 

192 Phe Glu Leu Met Glu Ala Pro Gly Glu Cys Cys Lys Lys Cys Glu Gln 

193 35 40 45 

195 ACG CAC TGT ATC ATC AAA CGG CCC GAC AAC CAG CAC GTC ATC CTG AAG 192 

196 Thr His Cys Ile Ile Lys Arg Pro Asp Asn Gln His Val Ile Leu Lys 

197 50 55 60 

199 CCC GGG GAC TTC AAG AGC GAC CCG AAG AAC AAC TGC ACA TTC TTC AGC 240 

200 Pro Gly Asp Phe Lys Ser Asp Pro Lys Asn Asn Cys Thr Phe Phe Ser 

201 65 70 75 80 

203 TGC GTG AAG ATC CAC AAC CAG CTC ATC TCG TCC GTT TCC AAC ATC ACC 288 

204 Cys Val Lys Ile His Asn Gln Leu Ile Ser Ser Val Ser Asn Ile Thr 

205 85 90 95 

207 TGC CCC AAC TTT GAT GCC AGC ATT TGC ATC CCG GGC TCC ATC ACA TTC 336 

208 Cys Pro Asn Phe Asp Ala Ser Ile Cys Ile Pro Gly Ser Ile Thr Phe 

209 100 105 110 

211 ATG CCC AAT GGA TGC TGC AAG ACC TGC ACC CCT CGC AAT GAG ACC AGG 384 

212 Met Pro Asn Gly Cys Cys Lys Thr Cys Thr Pro Arg Asn Glu Thr Arg 

213 115 120 125 

215 GTG CCC TGC TCC ACC GTC CCC GTC ACC ACG GAG GTT TCG TAC GCC GGC 432 

216 Val Pro Cys Ser Thr Val Pro Val Thr Thr Glu Val Ser Tyr Ala Gly 

217 130 135 140 

219 TGC ACC AAG ACC GTC CTC ATG AAT CAT TGC TCC GGG TCC TGC GGG ACA 480 

220 Cys Thr Lys Thr Val Leu Met Asn His Cys Ser Gly Ser Cys Gly Thr 

221 145 150 155 160 

223 TTT GTC ATG TAC TCG GCC AAG GCC CAG GCC CTG GAC CAC AGC TGC TCC 528 

224 Phe Val Met Tyr Ser Ala Lys Ala Gln Ala Leu Asp His Ser Cys Ser 

225 165 170 175 

227 TGC TGC AAA GAG GAG AAA ACC AGC CAG CGT GAG GTG GTC CTG AGC TGC 576 

228 Cys Cys Lys Glu Glu Lys Thr Ser Gln Arg Glu Val Val Leu Ser Cys 

229 180 185 190 

231 CCC AAT GGC GGC TCG CTG ACA CAC ACC TAC ACC CAC ATC GAG AGC TGC 624 

232 Pro Asn Gly Gly Ser Leu Thr His Thr Tyr Thr His Ile Glu Ser Cys 

233 195 200 205 

235 CAG TGC CAG GAC ACC GTC TGC GGG CTC CCC ACC GGC ACC TCC CGC CGG 672 

236 Gln Cys Gln Asp Thr Val Cys Gly Leu Pro Thr Gly Thr Ser Arg Arg 

237 210 215 220 

239 GCC CGG CGT TCC CCT AGG CAT CTG GGG AGC GGG TGAGCGGGGT GGGCACAGCC 725 

240 Ala Arg Arg Ser Pro Arg His Leu Gly Ser Gly 

241 225 230 235 
243 CCTTCACTGC CCTCGACAGC TTTACCTCCC CCGGACCCTC TGAGCCTCCT AAGCTCGGCT 785 
245 TCCTCTCTTC AGATATTTAT TGTCTGAGTT TTTGTTCAGT CCTTGCTTTC CAATAATAAA 845 
247 CTCAGGGGGA CATGCAAAAA AAAAAAAAA 874 

250 (2) INFORMATION FOR SEQ ID NO: 4: 

252 (i) SEQUENCE CHARACTERISTICS: 

253 (A) LENGTH: 235 amino acids 

254 (B) TYPE: amino acid 

255 (D) TOPOLOGY: linear 
257 (ii) MOLECULE TYPE: protein 

259 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

261 Gln Asp Cys Val Cys Thr Asp Lys Val Asp Asn Asn Thr Leu Leu Asn 

262 1 5 10 15 

264 Val Ile Ala Cys Thr His Val Pro Cys Asn Thr Ser Cys Ser Pro Gly 

265 20 25 30 

267 Phe Glu Leu Met Glu Ala Pro Gly Glu Cys Cys Lys Lys Cys Glu Gln 

268 35 40 45 

270 Thr His Cys Ile Ile Lys Arg Pro Asp Asn Gln His Val Ile Leu Lys 

271 50 55 60 

273 Pro Gly Asp Phe Lys Ser Asp Pro Lys Asn Asn Cys Thr Phe Phe Ser 

274 65 70 75 80 

276 Cys Val Lys Ile His Asn Gln Leu Ile Ser Ser Val Ser Asn Ile Thr 

277 85 90 95 

279 Cys Pro Asn Phe Asp Ala Ser Ile Cys Ile Pro Gly Ser Ile Thr Phe 

280 100 105 110 

282 Met Pro Asn Gly Cys Cys Lys Thr Cys Thr Pro Arg Asn Glu Thr Arg 

283 115 120 125 

285 Val Pro Cys Ser Thr Val Pro Val Thr Thr Glu Val Ser Tyr Ala Gly 

286 130 135 140 

288 Cys Thr Lys Thr Val Leu Met Asn His Cys Ser Gly Ser Cys Gly Thr 

289 145 150 155 160 

291 Phe Val Met Tyr Ser Ala Lys Ala Gln Ala Leu Asp His Ser Cys Ser 

292 165 170 175 

294 Cys Cys Lys Glu Glu Lys Thr Ser Gln Arg Glu Val Val Leu Ser Cys 

295 180 185 190 

297 Pro Asn Gly Gly Ser Leu Thr His Thr Tyr Thr His Ile Glu Ser Cys 

298 195 200 205 

300 Gln Cys Gln Asp Thr Val Cys Gly Leu Pro Thr Gly Thr Ser Arg Arg 

301 210 215 220 

303 Ala Arg Arg Ser Pro Arg His Leu Gly Ser Gly 

304 225 230 235 

306 (2) INFORMATION FOR SEQ ID NO: 5: 

308 (i) SEQUENCE CHARACTERISTICS: 

309 (A) LENGTH: 1209 base pairs 

310 (B) TYPE: nucleic acid 

311 (C) STRANDEDNESS: single 

312 (D) TOPOLOGY: linear 
314 (ii) MOLECULE TYPE: cDNA 

319 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

321 ATTGGTGCTA CCTGGCTCTC CTGTCTCTGC AGCTCTACAG GTGAGGCCCA GCAGAGGGAG 60 
323 TAGGGCTCGC CATGTTTCTG GTGAGCCAAT TTGGCTGATC TTGGGTGTCT GAACAGCTAT 120 
325 TGGGTCCACC CCAGTCCCTT TCAGCTGCTG CTTAATGCCC TGCTCTCTCC CTGGCCCACC 180 
327 TTATAGAGAG CCCAAAGAGC TCCTGTAAGA GGGAGAACTC TATCTGTGGT TTATAATCTT 240 
329 GCACGAGGCA CCAGAAGTCT CCCTGGGTCT TGTGAATGAA CTACATTTAT CCCCTTTCCT 300 
331 GCCCCAACCA CAAACTCTTT CCTTCAAAGA GGGCCTGCCT GGTTCCCTCC ACCCAACTGC 360 
333 ACCATGAGAT CGGTCCAAGA GTCCATTCCC CAGGTGGGAG CCAACTGTCA GGGAGGTCTT 420 
335 TCCCACCAAA CATCTTTCAG TTGCTGGGAG GTGACCATAG GGCTCTGCTT TTAAAGATAT 480 

337 GGCTGCTTCA AAGGCCAGAG TCACAGGAAG GACTTCTTCC AGGGAGATTA GTGGTGATGG 540 
339 AGAGGAGAGT TAAAATGACC TCATGTCCTT CTTGTCCACG GTTTTGTTGA GTTTTCACTC 600 
341 TTCTAATGCA AGGGTCTCAC ACTGTGAACC ACTTAGGATG TGATCACTTT CAGGTGGCCA 660 
343 GGAATGTTGA ATGTCTTTGG CTCAGTTCAT CTAAAAAAGA TATCTATTTG AAAGTTCTCA 720 
345 GAGTTGTACA TATGTTTCAC AGTACAGGAT CTGTACATAA AAGTTTCTTT CCTAAACCAT 780 
347 TCACCAAGAG CCAATATCTA GGCATTTCCT CGGTAGCACA AATTTTCTNA TTGCTTAGAA 840 
349 AATTGTCCTC CCTGTTCTTT CTGTCTGNAG ACTTAAGTGA GTTAGGTCTT TAAGGAAAGC 900 
351 AACGCTCCTC TGAAATGCTT GTCTTTTTTC TGTTGCCGAA ATAGCTGGTC CTTTTTCGGG 960 
353 AGTTAGATGT ATAGAGTGTT TGTATGTAAA CATTTCTTGT AGGCATCACC ATGAACANAG 1020 
355 ATATATTTTC TATTTANTTA NTATATGTGC ACTTCAAGAA GTCACTGTCA GAGAAATAAA 1080 
357 GAATTGTCTT AAATGTCATG ATTGGAGATG TCCTTTGCAT TGCTTGGAAG GGGTGTACCT 1140 
359 AGAGCCAAGG AAATTGGCTC TGGTTTGGAA AAATTTTGCT GTTATTATAG TAAACATACA 1200 
361 AAGGATGTC 1209 
363 (2) INFORMATION FOR SEQ ID NO: 6: 

365 (i) SEQUENCE CHARACTERISTICS: 

366 (A) LENGTH: 548 base pairs 

367 (B) TYPE: nucleic acid 

VERIFICATION SUMMARY DATE: 01/15/2002 

PATENT APPLICATION: US/09/988,292 TIME: 18:58:38 



Input Set : N:\Crf3\RULE60\09988292.raw 
Output Set: N:\CRF3\01152002\I988292.raw 

L:28 M:220 C: Keyword misspelled or invalid format, [(A) APPLICATION NUMBER : ] 

L:29 M:220 C: Keyword misspelled or invalid format, [(B) FILING DATE:] 

L:69 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:1 

L:109 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:1 

L:128 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:2 

L:158 M:341 W: (46) "n" or "Xaa" used, for SEQ ID#:2 